Re: [SQL] optimizing 2-table join w/millions of rows - Mailing list pgsql-sql

From Herouth Maoz
Subject Re: [SQL] optimizing 2-table join w/millions of rows
Date
Msg-id l03110700b27f0a10d780@[147.233.159.109]
Whole thread Raw
In response to Re: [SQL] optimizing 2-table join w/millions of rows  (Michael Olivier <molivier@yahoo.com>)
List pgsql-sql
At 5:05 +0200 on 23/11/98, Michael Olivier wrote:


>
> > How about:
> >
> > SELECT U.acctname
> > FROM   usertest U, bgndtest B
> > WHERE  B.acctname = U.acctname
> >   AND  B.part_needed=3 AND B.loc_needed=5
> >   AND  U.part=2 AND U.loc=3;
>
> Yes, that looks equivalent. My problem is this is too slow an
> operation as I'm benchmarking it right now. And if I add less-than or
> greater-than comparisons, the performance goes _way_ down from there.
> How can I get the best performance out of this kind of operation?
>
> Is there any way to force postgres to hold certain tables in memory
> all the time? As I said, cost of memory isn't an issue, but
> performance is.

This is very strange. As a general rule, a simple join should always be
faster than a query with a subquery. Especially when that subquery is "in"
and not "exists".

In principle, if there is an index on the part, loc and acctname (separate
indices) in usertest, and on part_needed, loc_needed and acctname in
bgndtest, the most optimal query should be: restrict each of the tables by
the literal comparisons, and then match the acctname.

I don't know who wrote the optimiser, and I'm not sure he is on this list.
There are optimizer settings which can be changed, such as SET R_PLANS ON.

In any case, if you see in "real life" that a subquery actually works
better for you, then avoid the join. No reason to do both:

SELECT acctname
FROM bgndtest
WHERE part_needed=3 and loc_needed=5
  AND acctname IN (
      SELECT acctname
      FROM usertest
      WHERE part=2 AND loc=3
  );

Or better yet:

SELECT acctname
FROM bgndtest B
WHERE part_needed=3 and loc_needed=5
  AND EXISTS (
      SELECT *
      FROM usertest U
      WHERE part=2 AND loc=3
        AND U.acctname = B.acctname
  );

Of course, you may do it the other way around. It depends on which of your
tables is supposed to have more records, and which of the comparisons is
actually a "=" comparison, which is very important, because less-than and
greater-than don't use the indices.

What I mean by "the other way around" is:

SELECT acctname
FROM usertest U
WHERE part=2 and loc=3
  AND EXISTS (
      SELECT *
      FROM bgndtest B
      WHERE part_needed=3 and loc_needed=5
        AND U.acctname = B.acctname
  );

As for your question - there is no way to keep an entire table in memory,
unless you have a ram disk.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma



pgsql-sql by date:

Previous
From: Michael Olivier
Date:
Subject: Re: [SQL] optimizing 2-table join w/millions of rows
Next
From: Sferacarta Software
Date:
Subject: Re[2]: [SQL] abs value, power, sqrt